Bad text recognition because language in "detect text from picture" is hard-coded english (tesseract ocr) #30280

Open
Ragnaroek8 opened this issue May 13, 2024 · 0 comments
Labels
area/web interface (Related to the Mastodon web interface), bug (Something isn't working), status/to triage (This issue needs to be triaged)


Steps to reproduce the problem

If you attach a picture to a toot on Mastodon, you can generate a description of the picture with "detect text from picture". But if the text in the picture is in a language other than English, the recognition results are very poor.

Expected behaviour

Good recognition of the text in a picture

Actual behaviour

Poor recognition (unless the language is English)

Detailed description

This feature uses Tesseract. Tesseract works with dictionaries and gets by far the best results when the configured recognition language matches the language of the text in the picture. But in the source code, English is hard-coded as the language.
At the very least, the language should be set to the client language that the user has selected. Better still, it should be selectable.

Mastodon instance

social.tchncs.de

Mastodon version

4.2.8

Browser name and version

Firefox 105.0.3

Operating system

Win 10

Technical details

The language setting is in the file:
focal_point_modal.jsx

await worker.loadLanguage('eng');
await worker.initialize('eng');
const { data: { text } } = await worker.recognize(media_url);
this.setState({ detecting: false });
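
One possible direction, as a minimal sketch only: derive the Tesseract language code from the user's interface locale instead of passing 'eng' directly. The function name `tesseractLangFor`, the table `TESSERACT_LANGS`, and the `userLocale` placeholder are hypothetical and not part of the Mastodon code base. The table is deliberately incomplete, and falling back to 'eng' simply keeps today's behaviour for unmapped locales.

```javascript
// Hypothetical, incomplete mapping from UI locale tags to Tesseract
// traineddata language codes. A real fix would need a fuller table.
const TESSERACT_LANGS = {
  en: 'eng',
  de: 'deu',
  fr: 'fra',
  es: 'spa',
  it: 'ita',
  nl: 'nld',
  pt: 'por',
  ru: 'rus',
  ja: 'jpn',
  ko: 'kor',
  'zh-cn': 'chi_sim',
  'zh-tw': 'chi_tra',
};

// Returns a Tesseract language code for a locale such as "de" or "de-DE".
// Tries the full tag first, then the primary subtag, then falls back to 'eng'.
function tesseractLangFor(locale) {
  const normalized = String(locale || '').toLowerCase().replace('_', '-');
  const primary = normalized.split('-')[0];
  const has = (key) => Object.prototype.hasOwnProperty.call(TESSERACT_LANGS, key);

  if (has(normalized)) return TESSERACT_LANGS[normalized];
  if (has(primary)) return TESSERACT_LANGS[primary];
  return 'eng';
}

// Assumed usage in place of the hard-coded calls quoted above
// (userLocale is a placeholder for wherever the client language is stored):
//   const lang = tesseractLangFor(userLocale);
//   await worker.loadLanguage(lang);
//   await worker.initialize(lang);
```

A manual language picker in the dialog could feed the same function, which would cover the case where the picture's language differs from the interface language.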